Motion compensation for fine-grain scalable video

ABSTRACT

A fine-grain scalable video data apparatus, system, method and data structure are disclosed. An encoder (110) encodes input video data as minimum bitrate macroblock data to produce DCT data having DCT coefficients representing a minimum bitrate version of the macroblock data. The encoder (110) also encodes the input video data as intermediate bitrate macroblock data to produce DCT data having DCT coefficients representing an intermediate bitrate version of the macroblock data. An adaptive motion compensator (132), whether incorporated within the encoder or external to it, communicates with the encoder to predict whether a decoded version of the intermediate bitrate macroblock data has an accumulated predicted error frame energy exceeding a maximum threshold (228).

CROSS-REFERENCES

[0001] This application claims the benefit of U.S. Provisional Patent Application Serial No. 60/297,330, filed Jun. 11, 2001 (Attorney Docket No. PU010128), which is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

[0002] The present disclosure is directed towards fine-grain scalable (“FGS”) video encoders, and in particular, to an adaptive motion compensation technique for coding video data using fine-grain scalability.

BACKGROUND OF THE INVENTION

[0003] Digital video data is often processed and transferred in the form of bit streams. A bit stream is fine-grain scalable (“FGS”) if it can be decoded at any one of a finely spaced set of bitrates between predetermined minimum and maximum rates. Unfortunately, this type of scalability typically results in a coding efficiency significantly lower than that of a non-scalable video coder-decoder (“CODEC”).

[0004] The Moving Picture Experts Group (“MPEG”) has adopted standards for streaming video. The MPEG-4 standard includes a mode for FGS video. In MPEG-4 FGS video, the current frame is predicted from the minimum-bitrate reconstructed version of the previous frame. If a higher-bitrate version of the previous frame were used for prediction instead, prediction drift would occur whenever the bit stream was decoded at a rate lower than the rate the encoder used for prediction. The prediction drift is caused by the difference between the encoder's reference frame and the decoder's reference frame. Accordingly, it is desirable to improve CODEC efficiency over that of typical FGS schemes such as the FGS video scheme adopted in the MPEG-4 standard.
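
The drift mechanism can be illustrated with a toy scalar model. The sketch below is not from the source: each "frame" is a single number, the base layer is a uniform quantizer with a hypothetical step size `step`, and the enhancement layer is assumed to be lossless, so the encoder's full-rate reconstruction equals the input. `simulate_drift` (an illustrative name) reports, after each frame, the difference between the encoder's reference and the reference held by a decoder that receives only the base layer.

```python
def simulate_drift(frames, step, predict_from_base):
    """Toy scalar model of FGS prediction drift (illustrative assumptions only).

    predict_from_base=True  -> encoder predicts from the base-layer reconstruction
                               (the MPEG-4 FGS choice).
    predict_from_base=False -> encoder predicts from its full-rate reconstruction,
                               assumed here to equal the input exactly.
    Returns, per frame, encoder reference minus base-only decoder reference.
    """
    enc_ref = 0.0  # reference the encoder predicts from
    dec_ref = 0.0  # reference of a decoder that receives only the base layer
    drift = []
    for x in frames:
        residual = x - enc_ref
        base_residual = step * round(residual / step)  # coarse base-layer residual
        dec_ref += base_residual  # base-only decoder adds the residual to its own reference
        if predict_from_base:
            enc_ref += base_residual  # encoder tracks the same base-layer reconstruction
        else:
            enc_ref = x  # encoder tracks the full-rate reconstruction
        drift.append(enc_ref - dec_ref)
    return drift
```

With `predict_from_base=True` the two references stay identical and the drift is zero. With `predict_from_base=False` and inputs `[0.4, 0.8, 1.2, 1.6]` at `step=1.0`, every base-layer residual rounds to zero, so the base-only decoder never moves while the encoder's reference does, and the drift grows frame by frame to 0.4, 0.8, 1.2 and 1.6.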

SUMMARY OF THE INVENTION

[0005] These and other drawbacks and disadvantages of the prior art are addressed by an apparatus and method for motion compensation of fine-grain scalable video data. Fine-grain scalable video data is generated by an encoder and an adaptive motion compensator. The encoder encodes input video data as minimum bitrate macroblock data to produce Discrete Cosine Transform (“DCT”) data having DCT coefficients representing a minimum bitrate version of the macroblock data, and encodes the input video data as intermediate bitrate macroblock data to produce DCT data having DCT coefficients representing an intermediate bitrate version of the macroblock data. The adaptive motion compensator, in signal communication with the encoder, predicts whether a decoded version of the intermediate bitrate macroblock data will have an accumulated predicted error frame energy exceeding a maximum threshold.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] The present disclosure teaches an efficient approach to motion compensation for fine-grain scalable video in accordance with the following exemplary figures, in which:

[0007] FIG. 1 is a block diagram of a fine-grain scalable (“FGS”) encoder with only base layer motion compensation;

[0008] FIG. 2 is a block diagram of a fine-grain scalable (“FGS”) encoder with adaptive motion compensation according to a preferred embodiment of the present invention; and

[0009] FIG. 3 is a flow diagram of the adaptive motion compensation of FIG. 2, in accordance with the principles of the present invention.

DETAILED DESCRIPTION

[0010] A video data coder-decoder (“CODEC”), in accordance with the embodiments of the present invention described herein, employs discrete cosine transform (“DCT”) based manipulation of video data. The video data is preferably organized as macroblocks.

[0011] MPEG-4 fine-grain scalability (“FGS”) uses the minimum-bitrate previous frame for motion compensation. In accordance with the principles of the invention, the encoder chooses between the minimum-bitrate previous frame and a higher-bitrate previous frame on a macroblock basis. The encoder tracks the accumulated prediction drift at each frame.

[0012] For a given macroblock, if using the higher-bitrate previous frame for motion compensation would result in a prediction drift energy above a maximum limit, the encoder chooses the minimum-bitrate previous frame to predict that macroblock. Otherwise, the encoder chooses the higher-bitrate previous frame to predict the macroblock. The encoder sets a bit (flag) in the coded macroblock to convey to the decoder which version of the previous frame was used for the prediction.

[0013] As shown in FIG. 1, an FGS encoder 10 can be functionally broken up into a Base Layer portion 11 and an Enhancement Layer portion 33. The Base Layer portion 11 includes an input terminal 12 that is coupled in signal communication to a positive input of a summing block 14. The summing block 14 is coupled, in turn, to a function block 16 for implementing a discrete cosine transform (“DCT”). The block 16 is coupled to a function block 18 for implementing the quantization transform Q. The function block 18 is coupled to a function block 20 for implementing variable length coding (“VLC”). The block 18 is further coupled to a function block 22 for implementing the inverse quantization transform Q⁻¹.

[0014] The block 22, in turn, is coupled to a function block 24 for implementing an inverse discrete cosine transform (“IDCT”). The block 24 is coupled to a positive input of a summing block 26, which is coupled to a block 28 for implementing a frame buffer. The block 28 is coupled to a function block 30 for performing motion estimation. The input terminal 12 is also coupled to the block 30 for providing an input video signal. The frame buffer 28 and the motion estimation block 30 are each coupled to a block 32 for performing motion compensation. The output of the function block 32 is coupled to a negative input of the summing block 14 and is also passed to a positive input of the summing block 26.

[0015] The enhancement layer portion 33 includes a summing block 34 having its positive input coupled to the output of the DCT 16, and its negative input coupled to the output of the inverse quantization block 22. The output of the block 34 is coupled to a function block 36 for implementing bit-plane coding. The output of the bit-plane coder 36 is coupled, in turn, to a function block 38 for implementing variable length coding (“VLC”).

[0016] In operation, the FGS encoder of FIG. 1 uses only the base layer for prediction, as is done in MPEG-4 FGS. The base layer encoder 11 is simply a single-layer DCT-based motion-compensated encoder. First, the input video is motion-compensated using motion vectors obtained from the motion estimation process. The prediction error is then transformed using the DCT, and the resulting DCT coefficients are quantized and entropy coded using a variable-length code. To reconstruct the base layer frame, inverse quantization is performed first, followed by an IDCT. The prediction that was subtracted in the motion compensation process is then added back in, and the reconstructed frame is stored in the frame buffer to be used as a reference for future pictures.
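
The base-layer loop just described can be sketched for one 8x8 block. This is a minimal illustration under stated assumptions, not the MPEG-4 reference implementation: the motion-compensated prediction is taken as given, the quantizer is a plain uniform quantizer with a hypothetical step `qstep`, the DCT is the orthonormal DCT-II written out in pure Python, and the variable-length coding stage is omitted. The function names are illustrative.

```python
import math

N = 8  # block size (8x8 DCT blocks assumed)


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """2-D forward DCT: C * block * C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """2-D inverse DCT: C^T * coeffs * C (valid because C is orthonormal)."""
    return matmul(matmul(CT, coeffs), C)


def encode_base_block(current, prediction, qstep):
    """One base-layer pass for a single block, in the order described for FIG. 1.

    Returns the quantized levels (what VLC would code), the unquantized and the
    inverse-quantized DCT coefficients (both needed by the enhancement layer),
    and the reconstructed block that goes into the frame buffer.
    """
    residual = [[current[i][j] - prediction[i][j] for j in range(N)] for i in range(N)]
    coeffs = dct2(residual)                                       # DCT
    levels = [[round(v / qstep) for v in row] for row in coeffs]  # Q (uniform, assumed)
    dequant = [[lv * qstep for lv in row] for row in levels]      # Q^-1
    recon_residual = idct2(dequant)                               # IDCT
    recon = [[prediction[i][j] + recon_residual[i][j] for j in range(N)] for i in range(N)]
    return levels, coeffs, dequant, recon
```

Because the orthonormal DCT preserves energy, the squared reconstruction error of a block in this sketch equals the squared quantization error of its coefficients, each of which is at most `qstep / 2` in magnitude.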

[0017] The first step in encoding the enhancement layer is to subtract the inverse-quantized DCT coefficients of the base layer from the unquantized coefficients. The bit planes of this difference are then scanned one at a time and variable-length coded. The decoder decodes some subset of these bit planes according to the bitrate available at the time of decoding.
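
The enhancement-layer steps above can be sketched on one block's coefficients, flattened into a list. The sketch assumes integer-valued coefficients (so the residual has well-defined bit planes), codes magnitudes most-significant plane first with signs kept separately, and leaves out the variable-length coding of each plane. The function names are illustrative, not taken from MPEG-4.

```python
def enhancement_residual(unquantized, dequantized):
    """First enhancement-layer step: unquantized minus inverse-quantized coefficients."""
    return [u - d for u, d in zip(unquantized, dequantized)]


def to_bit_planes(residual):
    """Split integer residual magnitudes into bit planes, most significant first.

    Returns (signs, planes); planes[0] is the most significant plane.
    """
    mags = [abs(v) for v in residual]
    signs = [-1 if v < 0 else 1 for v in residual]
    n_planes = max(mags).bit_length() if any(mags) else 0
    planes = [[(m >> p) & 1 for m in mags] for p in range(n_planes - 1, -1, -1)]
    return signs, planes


def from_bit_planes(signs, planes, n_received):
    """Decode only the first n_received planes (what a lower bitrate allows);
    planes that were not received are treated as zero."""
    total = len(planes)
    values = []
    for idx, s in enumerate(signs):
        mag = 0
        for p, plane in enumerate(planes[:n_received]):
            mag |= plane[idx] << (total - 1 - p)
        values.append(s * mag)
    return values
```

For a residual of `[5, -3, 0, 12, -1]` there are four planes; decoding only the first two recovers `[4, 0, 0, 12, 0]`, and decoding all four recovers the residual exactly.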

[0018] Turning to FIG. 2, an exemplary FGS encoder 110, in accordance with the principles of the present invention, can be conceptually broken up into a Base Layer portion 111 and an Enhancement Layer portion 133. The Base Layer portion 111 includes an input terminal 112 that is coupled in signal communication to a positive input of a summing block 114. The summing block 114 is coupled, in turn, to a function block 116 for implementing a discrete cosine transform (“DCT”). The block 116 is coupled to a function block 118 for implementing the quantization transform Q. The function block 118 is coupled to a function block 120 for implementing variable length coding (“VLC”). The block 118 is further coupled to a function block 122 for implementing the inverse quantization transform Q⁻¹.

[0019] The block 122, in turn, is coupled to a function block 124 for implementing an inverse discrete cosine transform (“IDCT”). The block 124 is coupled to a positive input of a summing block 126, which is coupled to a block 128 for implementing a frame buffer. The block 128 is coupled to a function block 130 for performing motion estimation. The input terminal 112 is also coupled to the block 130 for providing an input video signal. The frame buffer 128 and the motion estimator 130 are each coupled to a function block 132 for performing adaptive motion compensation. The output of the function block 132 is coupled to a negative input of the summing block 114 and is also passed to a positive input of the summing block 126.

[0020] The enhancement layer portion 133 includes a summing block 134 having its positive input coupled to the output of the DCT 116, and its negative input coupled to the output of the inverse quantization block 122. The output of the block 134 is coupled to a function block 136 for implementing bit-plane coding. The output of the bit-plane coder 136 is coupled, in turn, to a function block 138 for implementing variable length coding (“VLC”). The output of the bit-plane coder 136 is also coupled to a positive input of a summing block 139 in the base layer portion 111.

[0021] Returning to the base layer portion 111, the summing block 139 has another positive input coupled from the output of the inverse quantization block 122. The output of the summing block 139 is coupled to a function block 140 for implementing another IDCT. The IDCT 140 is coupled to a positive input of a summing block 142, which has another positive input coupled from the output of the adaptive motion compensator 132. The output of the summing block 142 is coupled to an enhancement layer frame buffer 144. The enhancement layer frame buffer 144 is coupled, in turn, to the adaptive motion compensator 132. A drift frame buffer 146 is coupled in bi-directional signal communication with the adaptive motion compensator 132.

[0022] In operation, the FGS encoder of FIG. 2 implements a preferred FGS method in accordance with the principles of the present invention. A significant difference between the FGS encoder of FIG. 2 and that of FIG. 1 is that, in the encoder of FIG. 2, the output of the bit-plane coding for a subset of the bit planes in the enhancement layer is added to the inverse-quantized DCT coefficients of the base layer, as the first step in obtaining the reconstructed enhancement layer frame f_mid. An IDCT is then performed, and the prediction from the motion compensation step is added back in. The result, f_mid, is stored in the enhancement layer frame buffer. The reconstructed base layer frame, f_min, is stored in the base layer frame buffer. In the adaptive motion compensation method, apparatus, and system that incorporate the principles of the present invention, the base layer and enhancement layer predictions are read, the accumulated prediction drift is computed assuming the enhancement layer prediction is used, and the appropriate prediction is selected. If the enhancement layer prediction is selected, the accumulated prediction drift is updated and written to the drift frame buffer.
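
The reconstruction step that distinguishes FIG. 2 from FIG. 1 can be sketched in the coefficient domain. Assuming integer coefficients and magnitude bit planes sent most significant first, keeping only the first `n_planes_kept` planes of the enhancement residual is the same as zeroing its low-order magnitude bits. Adding that truncated residual to the base layer's inverse-quantized coefficients gives the coefficients behind f_mid, while the base layer alone gives those behind f_min. The IDCT and the addition of the motion-compensated prediction that follow are omitted, and the function names are hypothetical.

```python
def truncate_to_planes(residual, n_planes_kept):
    """Keep the n most significant magnitude bit planes of the block; zero the rest."""
    total = max((abs(v).bit_length() for v in residual), default=0)
    drop = max(total - n_planes_kept, 0)  # number of low-order planes not received
    return [(1 if v >= 0 else -1) * ((abs(v) >> drop) << drop) for v in residual]


def min_and_mid_coefficients(dequantized, residual, n_planes_kept):
    """Return the DCT coefficients behind f_min (base layer only) and
    f_mid (base layer plus a subset of the enhancement bit planes)."""
    partial = truncate_to_planes(residual, n_planes_kept)
    f_min_coeffs = list(dequantized)
    f_mid_coeffs = [d + r for d, r in zip(dequantized, partial)]
    return f_min_coeffs, f_mid_coeffs
```

Keeping more planes moves the f_mid coefficients closer to the unquantized ones; keeping all of them reproduces the unquantized coefficients exactly.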

[0023] Referring to FIG. 3, the process of adaptive motion compensation referred to in FIG. 2 and the accompanying description is illustrated as method 200, in accordance with the principles of the present invention. Method 200 begins at start block 210 and proceeds to decision block 212. At decision block 212, it is determined whether the current image begins a new group of pictures (“GOP”). If the current image does begin a new GOP, control passes to function block 214, which resets the accumulated predicted error frame, F_d, to zero. After block 214, or if a new GOP was not detected at block 212, control passes to function block 216, which chooses an intermediate bitrate R_mid, where R_mid is any value between a minimum bitrate R_min and a maximum bitrate R_max. For exemplary purposes, R_mid may be taken to be halfway between R_min and R_max. Block 216 then passes control to function block 218, which fetches a macroblock from the frame F_min, corresponding to the previous frame coded at the minimum bitrate R_min. Block 218 then passes control to function block 220, which fetches a macroblock from the frame F_mid, corresponding to the previous frame coded at the intermediate bitrate R_mid. Block 220 then passes control to function block 222, which fetches a macroblock from the frame F_d, corresponding to the accumulated prediction error of the previous frame.

[0024] Function block 222 passes control to function block 226. Block 226 computes the energy E of the intermediate bitrate prediction P_mid relative to the accumulated prediction error F_d, and passes control to decision block 228. Decision block 228 determines whether the computed energy E is greater than a threshold E_max. If it is not, control passes to function block 230, which chooses the intermediate bitrate prediction P_mid and passes control to function block 232. Function block 232 updates the accumulated prediction error frame F_d and passes control to return block 236. If, at decision block 228, the energy E is greater than the threshold E_max, control passes to function block 234, which chooses the minimum bitrate prediction P_min and passes control to return block 236.

[0025] In operation of the present motion compensation method, the minimum and maximum bitrates of the encoded data stream are R_min and R_max, respectively, and R_mid is any intermediate bitrate between R_min and R_max. To encode a macroblock, the encoder fetches a motion-compensated block from the previous frame at R_min and a motion-compensated block from the previous frame at R_mid.

[0026] The encoder also fetches another block from a frame representing the accumulated prediction drift error. The accumulated prediction drift error frame is reset to zero at the beginning of every group of pictures (“GOP”). The blocks representing the minimum-rate prediction, the intermediate-rate prediction, and the accumulated prediction drift error are referred to as P_min, P_mid, and P_d, respectively. To determine which prediction to use, the encoder computes the energy of the prediction drift error that would result from using the P_mid prediction. If E is a function measuring the energy of a block, and E_max is the maximum permitted drift energy threshold, then the appropriate prediction is selected as follows:

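$$
\begin{aligned}
&\textbf{if } E\left(P_d + P_{\mathrm{min}} - P_{\mathrm{mid}}\right) > E_{\mathrm{max}}: && \text{Prediction} = P_{\mathrm{min}} \\
&\textbf{else}: && \text{Prediction} = P_{\mathrm{mid}}, \quad P_d \leftarrow P_d + P_{\mathrm{min}} - P_{\mathrm{mid}}
\end{aligned}
\tag{1}
$$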
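
Equation (1), together with the GOP reset and per-macroblock loop of method 200, can be sketched as follows. The source does not define E, so the sum of squared sample values is assumed here. Blocks are small lists of lists, motion compensation is assumed to have already produced the co-located blocks P_min, P_mid and P_d, and the function names are illustrative.

```python
def energy(block):
    """Assumed energy measure E: sum of squared sample values of a block."""
    return sum(v * v for row in block for v in row)


def select_prediction(p_min, p_mid, p_d, e_max):
    """Equation (1) for one block.

    Returns (use_mid, prediction, drift): use_mid is the 1-bit macroblock flag,
    prediction is P_min or P_mid, and drift is the P_d to write back.
    """
    candidate = [[d + lo - hi for d, lo, hi in zip(rd, rl, rh)]
                 for rd, rl, rh in zip(p_d, p_min, p_mid)]  # P_d + P_min - P_mid
    if energy(candidate) > e_max:
        return False, p_min, p_d   # would exceed E_max: use P_min, P_d left unchanged
    return True, p_mid, candidate  # use P_mid and accumulate the drift


def adaptive_mc_frame(min_blocks, mid_blocks, drift_blocks, starts_gop, e_max):
    """Method 200 (FIG. 3) over one frame. Each list holds one block per
    macroblock, fetched from F_min, F_mid and F_d (blocks 218-222)."""
    if starts_gop:  # blocks 212/214: reset the accumulated drift frame F_d
        drift_blocks = [[[0] * len(row) for row in blk] for blk in drift_blocks]
    flags, predictions, new_drift = [], [], []
    for p_min, p_mid, p_d in zip(min_blocks, mid_blocks, drift_blocks):
        use_mid, pred, p_d = select_prediction(p_min, p_mid, p_d, e_max)  # blocks 226-234
        flags.append(use_mid)
        predictions.append(pred)
        new_drift.append(p_d)
    return flags, predictions, new_drift
```

With P_min of all 10s, P_mid of all 12s and E_max = 20, a 2x2 block starting from zero drift accepts P_mid (candidate energy 16). A second block already carrying a drift of -2 per sample would reach energy 64, so it falls back to P_min. Starting a new GOP clears that history, and both blocks then accept P_mid.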

[0029] In this exemplary embodiment, a bit is included in the macroblock header to convey to the receiving decoder which prediction block was selected. In the decoder, two decoded versions of each frame, F_min and F_mid, are written into memory to be used as reference frames. The frame F_min represents the frame at the minimum bitrate, while the frame F_mid represents the frame at the intermediate bitrate. If the frame is decoded at a bitrate lower than R_mid, F_mid is approximated using the frame decoded at that lower bitrate.
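
On the decoder side, the paragraph above implies two small decisions: the rate at which to rebuild each reference, and the reference a macroblock uses. A sketch, assuming a hypothetical `reconstruct_at` callback that stands in for base-layer decoding plus partial bit-plane decoding, and a decode rate no lower than R_min:

```python
def build_references(decode_rate, r_min, r_mid, reconstruct_at):
    """Return the decoder's two reference frames (F_min, F_mid).

    reconstruct_at(rate) is a hypothetical callback that rebuilds the current
    frame from the base layer plus whatever enhancement bits fit within `rate`.
    When decode_rate < r_mid, F_mid is approximated by the frame decoded at
    that lower rate, as the text describes.
    """
    f_min = reconstruct_at(r_min)
    f_mid = reconstruct_at(min(decode_rate, r_mid))
    return f_min, f_mid


def reference_for_macroblock(mid_flag, f_min, f_mid):
    """Pick the reference signalled by the 1-bit macroblock header flag."""
    return f_mid if mid_flag else f_min
```

For example, with R_min = 100, R_mid = 500 and a channel that delivers 300 (arbitrary rate units), this sketch builds F_min at 100 and approximates F_mid with its 300 reconstruction. Above 500 it rebuilds F_mid at R_mid and ignores the additional enhancement bits.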

[0030] These and other features and advantages of the present disclosure may be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the teachings of the present disclosure may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.

[0031] Most preferably, the teachings of the present disclosure are implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform, such as an additional data storage unit and a printing unit.

[0032] It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present disclosure is programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present disclosure.

[0033] Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present disclosure is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present disclosure. All such changes and modifications are intended to be included within the scope of the present disclosure as set forth in the appended claims.

What is claimed is:
 1. A fine-grain scalable video data apparatus comprising: an encoder (110) for encoding input video data as minimum bitrate macroblock data to produce DCT data having DCT coefficients representing a minimum bitrate version of the macroblock data, and for encoding the input video data as intermediate bitrate macroblock data to produce DCT data having DCT coefficients representing an intermediate bitrate version of the macroblock data; and an adaptive motion compensator (132) in signal communication with the encoder for predicting whether a decoded version of the intermediate bitrate macroblock data has an accumulated predicted error frame energy exceeding a maximum threshold (228).
 2. The apparatus as defined in claim 1 wherein the input video data is fine-grain scalable between the minimum bitrate and a maximum bitrate, the intermediate bitrate falling anywhere therebetween.
 3. The apparatus as defined in claim 1 wherein the encoder comprises the adaptive motion compensator.
 4. The apparatus as defined in claim 1, the encoder comprising: an enhancement layer frame buffer (144); the adaptive motion compensator (132) in signal communication with the enhancement layer frame buffer; and a drift frame buffer (146) in signal communication with the enhancement layer frame buffer.
 5. The apparatus as defined in claim 1, the adaptive motion compensator comprising: a group-of-pictures detector for resetting a drift frame buffer for each new group-of-pictures (212); an energy unit (226) for computing the energy of an intermediate-rate prediction relative to the drift frame buffer (226); and a prediction unit for selecting one of the intermediate-rate prediction and a minimum-rate prediction for each block of pixels to be predicted from the data of the previous picture using the motion vectors for the macroblock data (228).
 6. A fine-grain scalable video data apparatus for receiving encoded video macroblock data wherein each macroblock is represented by one of DCT coefficients representing a minimum bitrate version of the macroblock data and DCT coefficients representing an intermediate bitrate version of the macroblock data, the apparatus comprising a decoder for decoding one of the intermediate and minimum bitrate encoded DCT data for each macroblock received from the encoder to produce reconstructed macroblock data responsive to a predicted energy of an accumulated predicted error frame.
 7. A method for performing fine-grain scalable video data operations, the method comprising: encoding input video data as minimum bitrate macroblock data to produce DCT data having DCT coefficients representing a minimum bitrate version of the macroblock data; encoding the input video data as intermediate bitrate macroblock data to produce DCT data having DCT coefficients representing an intermediate bitrate version of the macroblock data; and compensating the encoded data to predict whether a decoded version of the intermediate bitrate macroblock data has an accumulated predicted error frame energy exceeding a maximum threshold.
 8. A method as defined in claim 7, further comprising decoding one of the intermediate and minimum bitrate encoded DCT data from the encoded input video data to produce reconstructed macroblock data responsive to the predicted energy of the accumulated predicted error frame.
 9. A method as defined in claim 7 wherein the input video data is fine-grain scalable between the minimum bitrate and a maximum bitrate, the intermediate bitrate falling anywhere therebetween.
 10. A method as defined in claim 7, further comprising compensating enhancement layer block data with enhancement layer data of a previous picture and motion vectors for the macroblock data to produce compensated enhancement layer block data such that DCT is performed with respect to the compensated enhancement layer block data to produce the enhancement layer DCT data.
 11. A method for receiving encoded video macroblock data wherein each macroblock is represented by one of DCT coefficients representing a minimum bitrate version of the macroblock data and DCT coefficients representing an intermediate bitrate version of the macroblock data, the method comprising decoding one of the intermediate and minimum bitrate encoded DCT data for each macroblock received from the encoder to produce reconstructed macroblock data responsive to a predicted energy of an accumulated predicted error frame.
 12. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform steps for performing fine-grain scalable video data operations, the steps comprising: encoding input video data as minimum bitrate macroblock data to produce DCT data having DCT coefficients representing a minimum bitrate version of the macroblock data; encoding the input video data as intermediate bitrate macroblock data to produce DCT data having DCT coefficients representing an intermediate bitrate version of the macroblock data; and compensating the encoded data to predict whether a decoded version of the intermediate bitrate macroblock data will have an accumulated predicted error frame energy exceeding a maximum threshold.
 13. A program storage device as defined in claim 12, the steps further comprising decoding one of the intermediate and minimum bitrate encoded DCT data from the encoded input video data to produce reconstructed macroblock data responsive to the predicted energy of the accumulated predicted error frame.
 14. A program storage device as defined in claim 12 wherein the input video data is fine-grain scalable between the minimum bitrate and a maximum bitrate, the intermediate bitrate falling anywhere therebetween.
 15. A program storage device as defined in claim 12, the steps further comprising compensating enhancement layer block data with enhancement layer data of a previous picture and motion vectors for the macroblock data to produce compensated enhancement layer block data such that DCT is performed with respect to the compensated enhancement layer block data to produce the enhancement layer DCT data.
 16. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform steps for receiving encoded video macroblock data wherein each macroblock is represented by one of DCT coefficients representing a minimum bitrate version of the macroblock data and DCT coefficients representing an intermediate bitrate version of the macroblock data, the steps comprising decoding one of the intermediate and minimum bitrate encoded DCT data for each macroblock received from the encoder to produce reconstructed macroblock data responsive to a predicted energy of an accumulated predicted error frame.
 17. A fine-grain scalable video data system for performing fine-grain scalable video data operations, the system comprising: minimum encoding means for encoding input video data as minimum bitrate macroblock data to produce DCT data having DCT coefficients representing a minimum bitrate version of the macroblock data; intermediate encoding means for encoding the input video data as intermediate bitrate macroblock data to produce DCT data having DCT coefficients representing an intermediate bitrate version of the macroblock data; and compensating means for compensating the encoded data to predict whether a decoded version of the intermediate bitrate macroblock data will have an accumulated predicted error frame energy exceeding a maximum threshold.
 18. A system as defined in claim 17, further comprising decoding means for decoding one of the intermediate and minimum bitrate encoded DCT data from the encoded input video data to produce reconstructed macroblock data responsive to the predicted energy of the accumulated predicted error frame.
 19. A system as defined in claim 17 wherein the input video data is fine-grain scalable between the minimum bitrate and a maximum bitrate, the intermediate bitrate falling anywhere therebetween.
 20. A system as defined in claim 17, further comprising compensating means for compensating enhancement layer block data with enhancement layer data of a previous picture and motion vectors for the macroblock data to produce compensated enhancement layer block data such that DCT is performed with respect to the compensated enhancement layer block data to produce the enhancement layer DCT data.